Despite the importance of image representations such as histograms of oriented gradients and deep Convolutional Neural Networks (CNNs), our theoretical understanding of them remains limited. Aiming to fill this gap, we investigate three key mathematical properties of representations: equivariance, invariance, and equivalence. Equivariance studies how transformations of the input image are encoded by the representation, invariance being a special case where a transformation has no effect. Equivalence studies whether two representations, for example two different parametrisations of a CNN, capture the same visual information or not. A number of methods to establish these properties empirically are proposed, including the introduction of transformation and stitching layers in CNNs. These methods are then applied to popular representations to reveal insightful aspects of their structure, including clarifying at which layers in a CNN certain geometric invariances are achieved. While the focus of the paper is theoretical, direct applications to structured-output regression are demonstrated too.